On speaker adaptation of long short-term memory recurrent neural networks

نویسندگان

Yajie Miao

Florian Metze

چکیده

Long Short-Term Memory (LSTM) is a recurrent neural network (RNN) architecture specializing in modeling long-range temporal dynamics. On acoustic modeling tasks, LSTM-RNNs have shown better performance than DNNs and conventional RNNs. In this paper, we conduct an extensive study on speaker adaptation of LSTM-RNNs. Speaker adaptation helps to reduce the mismatch between acoustic models and testing speakers. We have two main goals for this study. First, on a benchmark dataset, the existing DNN adaptation techniques are evaluated on the adaptation of LSTM-RNNs. We observe that LSTMRNNs can be effectively adapted by using speaker-adaptive (SA) front-end, or by inserting speaker-dependent (SD) layers. Second, we propose two adaptation approaches that implement the SD-layer-insertion idea specifically for LSTM-RNNs. Using these approaches, speaker adaptation improves word error rates by 3-4% relative over a strong LSTM-RNN baseline. This improvement is enlarged to 6-7% if we exploit SA features for further adaptation.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

ASR Confidence Estimation with Speaker-Adapted Recurrent Neural Networks

Confidence estimation for automatic speech recognition has been very recently improved by using Recurrent Neural Networks (RNNs), and also by speaker adaptation (on the basis of Conditional Random Fields). In this work, we explore how to obtain further improvements by combining RNNs and speaker adaptation. In particular, we explore different speakerdependent and -independent data representation...

متن کامل

Modeling speaker variability using long short-term memory networks for speech recognition

Speaker adaptation of deep neural networks (DNNs) based acoustic models is still a challenging area of research. Considering that long short-term memory (LSTM) recurrent neural networks (RNNs) have been successfully applied to many sequence prediction and sequence labeling tasks, we propose to use LSTM RNNs for modeling speaker variability in automatic speech recognition (ASR). Firstly, the LST...

متن کامل

Speech Emotion Recognition Using Scalogram Based Deep Structure

Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...

متن کامل

Integration of remote sensing and meteorological data to predict flooding time using deep learning algorithm

Accurate flood forecasting is a vital need to reduce its risks. Due to the complicated structure of flood and river flow, it is somehow difficult to solve this problem. Artificial neural networks, such as frequent neural networks, offer good performance in time series data. In recent years, the use of Long Short Term Memory networks hase attracted much attention due to the faults of frequent ne...

متن کامل

Improving Speaker-Independent Lipreading with Domain-Adversarial Training

We present a Lipreading system, i.e. a speech recognition system using only visual features, which uses domain-adversarial training for speaker independence. Domain-adversarial training is integrated into the optimization of a lipreader based on a stack of feedforward and LSTM (Long Short-Term Memory) recurrent neural networks, yielding an end-to-end trainable system which only requires a very ...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2015

On speaker adaptation of long short-term memory recurrent neural networks

نویسندگان

چکیده

منابع مشابه

ASR Confidence Estimation with Speaker-Adapted Recurrent Neural Networks

Modeling speaker variability using long short-term memory networks for speech recognition

Speech Emotion Recognition Using Scalogram Based Deep Structure

Integration of remote sensing and meteorological data to predict flooding time using deep learning algorithm

Improving Speaker-Independent Lipreading with Domain-Adversarial Training

عنوان ژورنال:

اشتراک گذاری